{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "dd9c19f5bdaa4b9da051dc46e70b6854": {
          "model_module": "@jupyter-widgets/output",
          "model_name": "OutputModel",
          "model_module_version": "1.0.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/output",
            "_model_module_version": "1.0.0",
            "_model_name": "OutputModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/output",
            "_view_module_version": "1.0.0",
            "_view_name": "OutputView",
            "layout": "IPY_MODEL_fb490fd1852c454f854c8a666fdf172f",
            "msg_id": "",
            "outputs": [
              {
                "output_type": "display_data",
                "data": {
                  "text/plain": "  ✨ You're running DeepEval's latest \u001b[38;2;106;0;255mSupport Email Quality [Arena GEval] Metric\u001b[0m! \u001b[38;2;55;65;81m(using gpt-5, async_mode=True)...\u001b[0m\n",
                  "text/html": "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">  ✨ You're running DeepEval's latest <span style=\"color: #6a00ff; text-decoration-color: #6a00ff\">Support Email Quality [Arena GEval] Metric</span>! <span style=\"color: #374151; text-decoration-color: #374151\">(using gpt-5, async_mode=True)...</span>\n</pre>\n"
                },
                "metadata": {}
              }
            ]
          }
        },
        "fb490fd1852c454f854c8a666fdf172f": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        }
      }
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# LLM Arena-as-a-Judge\n",
        "In this tutorial, we will explore how to implement the LLM Arena-as-a-Judge approach to evaluate large language model outputs. Instead of assigning isolated numerical scores to each response, this method performs head-to-head comparisons between outputs to determine which one is better — based on criteria you define, such as helpfulness, clarity, or tone.\n",
        "\n",
        "We'll use OpenAI's `GPT-4.1` and `Gemini 2.5 Pro` to generate responses, and leverage `GPT-5` as the judge to evaluate their outputs. For demonstration, we’ll work with a simple email support scenario, where the context is as follows:\n",
        "\n",
        "```\n",
        "Dear Support,  \n",
        "I ordered a wireless mouse last week, but I received a keyboard instead.  \n",
        "Can you please resolve this as soon as possible?  \n",
        "Thank you,  \n",
        "John  \n",
        "```\n",
        "\n"
      ],
      "metadata": {
        "id": "Q83nnEAmy-8Z"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Installing the dependencies"
      ],
      "metadata": {
        "id": "76yF6Br7zueh"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "collapsed": true,
        "id": "_Q6F8Cm0hUxl",
        "outputId": "11997813-7e53-4558-f05c-dcc6e09163d6"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Requirement already satisfied: deepeval in /usr/local/lib/python3.12/dist-packages (3.4.0)\n",
            "Requirement already satisfied: google-genai in /usr/local/lib/python3.12/dist-packages (1.30.0)\n",
            "Requirement already satisfied: openai in /usr/local/lib/python3.12/dist-packages (1.100.0)\n",
            "Requirement already satisfied: aiohttp in /usr/local/lib/python3.12/dist-packages (from deepeval) (3.12.15)\n",
            "Requirement already satisfied: anthropic in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.64.0)\n",
            "Requirement already satisfied: click<8.3.0,>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (8.2.1)\n",
            "Requirement already satisfied: grpcio<2.0.0,>=1.67.1 in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.74.0)\n",
            "Requirement already satisfied: nest_asyncio in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.6.0)\n",
            "Requirement already satisfied: ollama in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.5.3)\n",
            "Requirement already satisfied: opentelemetry-api<2.0.0,>=1.24.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.36.0)\n",
            "Requirement already satisfied: opentelemetry-exporter-otlp-proto-grpc<2.0.0,>=1.24.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.36.0)\n",
            "Requirement already satisfied: opentelemetry-sdk<2.0.0,>=1.24.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.36.0)\n",
            "Requirement already satisfied: portalocker in /usr/local/lib/python3.12/dist-packages (from deepeval) (3.2.0)\n",
            "Requirement already satisfied: posthog<7.0.0,>=6.3.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (6.6.1)\n",
            "Requirement already satisfied: pyfiglet in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.0.4)\n",
            "Requirement already satisfied: pytest in /usr/local/lib/python3.12/dist-packages (from deepeval) (8.4.1)\n",
            "Requirement already satisfied: pytest-asyncio in /usr/local/lib/python3.12/dist-packages (from deepeval) (1.1.0)\n",
            "Requirement already satisfied: pytest-repeat in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.9.4)\n",
            "Requirement already satisfied: pytest-rerunfailures<13.0,>=12.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (12.0)\n",
            "Requirement already satisfied: pytest-xdist in /usr/local/lib/python3.12/dist-packages (from deepeval) (3.8.0)\n",
            "Requirement already satisfied: requests<3.0.0,>=2.31.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (2.32.4)\n",
            "Requirement already satisfied: rich<15.0.0,>=13.6.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (13.9.4)\n",
            "Requirement already satisfied: sentry-sdk in /usr/local/lib/python3.12/dist-packages (from deepeval) (2.35.0)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from deepeval) (75.2.0)\n",
            "Requirement already satisfied: tabulate<0.10.0,>=0.9.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.9.0)\n",
            "Requirement already satisfied: tenacity<=10.0.0,>=8.0.0 in /usr/local/lib/python3.12/dist-packages (from deepeval) (8.5.0)\n",
            "Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /usr/local/lib/python3.12/dist-packages (from deepeval) (4.67.1)\n",
            "Requirement already satisfied: typer<1.0.0,>=0.9 in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.16.0)\n",
            "Requirement already satisfied: wheel in /usr/local/lib/python3.12/dist-packages (from deepeval) (0.45.1)\n",
            "Requirement already satisfied: anyio<5.0.0,>=4.8.0 in /usr/local/lib/python3.12/dist-packages (from google-genai) (4.10.0)\n",
            "Requirement already satisfied: google-auth<3.0.0,>=2.14.1 in /usr/local/lib/python3.12/dist-packages (from google-genai) (2.38.0)\n",
            "Requirement already satisfied: httpx<1.0.0,>=0.28.1 in /usr/local/lib/python3.12/dist-packages (from google-genai) (0.28.1)\n",
            "Requirement already satisfied: pydantic<3.0.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from google-genai) (2.11.7)\n",
            "Requirement already satisfied: websockets<15.1.0,>=13.0.0 in /usr/local/lib/python3.12/dist-packages (from google-genai) (15.0.1)\n",
            "Requirement already satisfied: typing-extensions<5.0.0,>=4.11.0 in /usr/local/lib/python3.12/dist-packages (from google-genai) (4.14.1)\n",
            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.12/dist-packages (from openai) (1.9.0)\n",
            "Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from openai) (0.10.0)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from openai) (1.3.1)\n",
            "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5.0.0,>=4.8.0->google-genai) (3.10)\n",
            "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from google-auth<3.0.0,>=2.14.1->google-genai) (5.5.2)\n",
            "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.12/dist-packages (from google-auth<3.0.0,>=2.14.1->google-genai) (0.4.2)\n",
            "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.12/dist-packages (from google-auth<3.0.0,>=2.14.1->google-genai) (4.9.1)\n",
            "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0,>=0.28.1->google-genai) (2025.8.3)\n",
            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1.0.0,>=0.28.1->google-genai) (1.0.9)\n",
            "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1.0.0,>=0.28.1->google-genai) (0.16.0)\n",
            "Requirement already satisfied: importlib-metadata<8.8.0,>=6.0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-api<2.0.0,>=1.24.0->deepeval) (8.7.0)\n",
            "Requirement already satisfied: googleapis-common-protos~=1.57 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-exporter-otlp-proto-grpc<2.0.0,>=1.24.0->deepeval) (1.70.0)\n",
            "Requirement already satisfied: opentelemetry-exporter-otlp-proto-common==1.36.0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-exporter-otlp-proto-grpc<2.0.0,>=1.24.0->deepeval) (1.36.0)\n",
            "Requirement already satisfied: opentelemetry-proto==1.36.0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-exporter-otlp-proto-grpc<2.0.0,>=1.24.0->deepeval) (1.36.0)\n",
            "Requirement already satisfied: protobuf<7.0,>=5.0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-proto==1.36.0->opentelemetry-exporter-otlp-proto-grpc<2.0.0,>=1.24.0->deepeval) (5.29.5)\n",
            "Requirement already satisfied: opentelemetry-semantic-conventions==0.57b0 in /usr/local/lib/python3.12/dist-packages (from opentelemetry-sdk<2.0.0,>=1.24.0->deepeval) (0.57b0)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from posthog<7.0.0,>=6.3.0->deepeval) (1.17.0)\n",
            "Requirement already satisfied: python-dateutil>=2.2 in /usr/local/lib/python3.12/dist-packages (from posthog<7.0.0,>=6.3.0->deepeval) (2.9.0.post0)\n",
            "Requirement already satisfied: backoff>=1.10.0 in /usr/local/lib/python3.12/dist-packages (from posthog<7.0.0,>=6.3.0->deepeval) (2.2.1)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->google-genai) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->google-genai) (2.33.2)\n",
            "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.0.0->google-genai) (0.4.1)\n",
            "Requirement already satisfied: packaging>=17.1 in /usr/local/lib/python3.12/dist-packages (from pytest-rerunfailures<13.0,>=12.0->deepeval) (25.0)\n",
            "Requirement already satisfied: iniconfig>=1 in /usr/local/lib/python3.12/dist-packages (from pytest->deepeval) (2.1.0)\n",
            "Requirement already satisfied: pluggy<2,>=1.5 in /usr/local/lib/python3.12/dist-packages (from pytest->deepeval) (1.6.0)\n",
            "Requirement already satisfied: pygments>=2.7.2 in /usr/local/lib/python3.12/dist-packages (from pytest->deepeval) (2.19.2)\n",
            "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.31.0->deepeval) (3.4.3)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.31.0->deepeval) (2.5.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich<15.0.0,>=13.6.0->deepeval) (4.0.0)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer<1.0.0,>=0.9->deepeval) (1.5.4)\n",
            "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (2.6.1)\n",
            "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (1.4.0)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (25.3.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (1.7.0)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (6.6.4)\n",
            "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (0.3.2)\n",
            "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp->deepeval) (1.20.1)\n",
            "Requirement already satisfied: execnet>=2.1 in /usr/local/lib/python3.12/dist-packages (from pytest-xdist->deepeval) (2.1.1)\n",
            "Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.12/dist-packages (from importlib-metadata<8.8.0,>=6.0->opentelemetry-api<2.0.0,>=1.24.0->deepeval) (3.23.0)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich<15.0.0,>=13.6.0->deepeval) (0.1.2)\n",
            "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/lib/python3.12/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.0,>=2.14.1->google-genai) (0.6.1)\n"
          ]
        }
      ],
      "source": [
        "!pip install deepeval google-genai openai"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this tutorial, you’ll need API keys from both OpenAI and Google.\n",
        "\n",
        "* Google API Key: Visit https://aistudio.google.com/apikey\n",
        " to generate your key.\n",
        "\n",
        "* OpenAI API Key: Go to https://platform.openai.com/settings/organization/api-keys\n",
        " and create a new key. If you’re a new user, you may need to add billing information and make a minimum payment of $5 to activate API access.\n",
        "\n",
        "Since we’re using Deepeval for evaluation, the OpenAI API key is required"
      ],
      "metadata": {
        "id": "FpnM64Ydz5Jz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "os.environ[\"OPENAI_API_KEY\"] = getpass('Enter OpenAI API Key: ')\n",
        "os.environ['GOOGLE_API_KEY'] = getpass('Enter Google API Key: ')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "w8dXr5r8he8m",
        "outputId": "686c681d-53e7-478b-fa5e-356b85ae089c"
      },
      "execution_count": 82,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Enter OpenAI API Key: ··········\n",
            "Enter Google API Key: ··········\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Defining the context\n",
        "Next, we’ll define the context for our test case. In this example, we’re working with a customer support scenario where a user reports receiving the wrong product. We’ll create a context_email containing the original message from the customer and then build a prompt to generate a response based on that context."
      ],
      "metadata": {
        "id": "NpKZFlQg0imR"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from deepeval.test_case import ArenaTestCase, LLMTestCase, LLMTestCaseParams\n",
        "from deepeval.metrics import ArenaGEval\n",
        "\n",
        "context_email = \"\"\"\n",
        "Dear Support,\n",
        "I ordered a wireless mouse last week, but I received a keyboard instead.\n",
        "Can you please resolve this as soon as possible?\n",
        "Thank you,\n",
        "John\n",
        "\"\"\"\n",
        "\n",
        "prompt = f\"\"\"\n",
        "{context_email}\n",
        "--------\n",
        "\n",
        "Q: Write a response to the customer email above.\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "abAutdIbjMCg"
      },
      "execution_count": 72,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## OpenAI Model Response"
      ],
      "metadata": {
        "id": "Jk9fZMQ20myf"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from openai import OpenAI\n",
        "client = OpenAI()\n",
        "\n",
        "def get_openai_response(prompt: str, model: str = \"gpt-4.1\") -> str:\n",
        "    response = client.chat.completions.create(\n",
        "        model=model,\n",
        "        messages=[\n",
        "            {\"role\": \"user\", \"content\": prompt}\n",
        "        ]\n",
        "    )\n",
        "    return response.choices[0].message.content"
      ],
      "metadata": {
        "id": "IhI4RmCxnHK8"
      },
      "execution_count": 58,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "openAI_response = get_openai_response(prompt=prompt)"
      ],
      "metadata": {
        "id": "D934qy2ytv_L"
      },
      "execution_count": 60,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Gemini Model Response"
      ],
      "metadata": {
        "id": "YJddAa4E0pmh"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google import genai\n",
        "client = genai.Client()\n",
        "\n",
        "def get_gemini_response(prompt, model=\"gemini-2.5-pro\"):\n",
        "    response = client.models.generate_content(\n",
        "        model=model,\n",
        "        contents=prompt\n",
        "    )\n",
        "    return response.text"
      ],
      "metadata": {
        "id": "RzgetjXnoh4M"
      },
      "execution_count": 62,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "geminiResponse = get_gemini_response(prompt=prompt)"
      ],
      "metadata": {
        "id": "OUbDhHl9t6f9"
      },
      "execution_count": 67,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Defining the Arena Test Case\n",
        "Here, we set up the ArenaTestCase to compare the outputs of two models — GPT-4 and Gemini — for the same input prompt. Both models receive the same context_email, and their generated responses are stored in openAI_response and geminiResponse for evaluation."
      ],
      "metadata": {
        "id": "LXNiGLIY2F0t"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "a_test_case = ArenaTestCase(\n",
        "    contestants={\n",
        "        \"GPT-4\": LLMTestCase(\n",
        "            input=\"Write a response to the customer email above.\",\n",
        "            context=[context_email],\n",
        "            actual_output=openAI_response,\n",
        "        ),\n",
        "        \"Gemini\": LLMTestCase(\n",
        "            input=\"Write a response to the customer email above.\",\n",
        "            context=[context_email],\n",
        "            actual_output=geminiResponse,\n",
        "        ),\n",
        "    },\n",
        ")"
      ],
      "metadata": {
        "id": "delvKVcVldje"
      },
      "execution_count": 77,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Setting Up the Evaluation Metric\n",
        "Here, we define the ArenaGEval metric named `Support Email Quality`. The evaluation focuses on empathy, professionalism, and clarity — aiming to identify the response that is understanding, polite, and concise. The evaluation considers the context, input, and model outputs, using GPT-5 as the evaluator with verbose logging enabled for better insights."
      ],
      "metadata": {
        "id": "o6TNYiJ62PKq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "metric = ArenaGEval(\n",
        "    name=\"Support Email Quality\",\n",
        "    criteria=(\n",
        "        \"Select the response that best balances empathy, professionalism, and clarity. \"\n",
        "        \"It should sound understanding, polite, and be succinct.\"\n",
        "    ),\n",
        "    evaluation_params=[\n",
        "        LLMTestCaseParams.CONTEXT,\n",
        "        LLMTestCaseParams.INPUT,\n",
        "        LLMTestCaseParams.ACTUAL_OUTPUT,\n",
        "    ],\n",
        "    model=\"gpt-5\",\n",
        "    verbose_mode=True\n",
        ")"
      ],
      "metadata": {
        "id": "Id1TKJ4ll7vL"
      },
      "execution_count": 78,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Running the Evaluation"
      ],
      "metadata": {
        "id": "jyE3_uIq2kME"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "metric.measure(a_test_case)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 528,
          "referenced_widgets": [
            "dd9c19f5bdaa4b9da051dc46e70b6854",
            "fb490fd1852c454f854c8a666fdf172f"
          ]
        },
        "id": "HZaT6ndlmD4M",
        "outputId": "4c88b5a0-234c-482d-da82-67a7aa55fb18"
      },
      "execution_count": 79,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Output()"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "dd9c19f5bdaa4b9da051dc46e70b6854"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "**************************************************\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">**************************************************\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Support Email Quality [Arena GEval] Verbose Logs\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Support Email Quality [Arena GEval] Verbose Logs\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "**************************************************\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">**************************************************\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Criteria:\n",
              "Select the response that best balances empathy, professionalism, and clarity. It should sound understanding, \n",
              "polite, and be succinct. \n",
              " \n",
              "Evaluation Steps:\n",
              "[\n",
              "    \"From the Context and Input, identify the user’s intent, needs, tone, and any constraints or specifics to be \n",
              "addressed.\",\n",
              "    \"Verify the Actual Output directly responds to the Input, uses relevant details from the Context, and remains \n",
              "consistent with any constraints.\",\n",
              "    \"Evaluate empathy: check whether the Actual Output acknowledges the user’s situation/feelings from the \n",
              "Context/Input in a polite, understanding way.\",\n",
              "    \"Evaluate professionalism and clarity: ensure respectful, blame-free tone and concise, easy-to-understand \n",
              "wording; choose the response that best balances empathy, professionalism, and succinct clarity.\"\n",
              "] \n",
              " \n",
              "Winner: GPT-4\n",
              " \n",
              "Reason: GPT-4 delivers a single, concise, and professional email that directly addresses the context (acknowledges \n",
              "receiving a keyboard instead of the ordered wireless mouse), apologizes, and clearly outlines next steps (send the \n",
              "correct mouse and provide return instructions) with a polite verification step (requesting a photo). This best \n",
              "matches the request to write a response and balances empathy and clarity. In contrast, Gemini includes multiple \n",
              "options with meta commentary, which dilutes focus and fails to provide one clear reply; while empathetic and \n",
              "detailed (e.g., acknowledging frustration and offering prepaid labels), the multi-option format and an \n",
              "over-assertive claim of already locating the order reduce professionalism and succinct clarity compared to GPT-4.\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Criteria:\n",
              "Select the response that best balances empathy, professionalism, and clarity. It should sound understanding, \n",
              "polite, and be succinct. \n",
              " \n",
              "Evaluation Steps:\n",
              "[\n",
              "    \"From the Context and Input, identify the user’s intent, needs, tone, and any constraints or specifics to be \n",
              "addressed.\",\n",
              "    \"Verify the Actual Output directly responds to the Input, uses relevant details from the Context, and remains \n",
              "consistent with any constraints.\",\n",
              "    \"Evaluate empathy: check whether the Actual Output acknowledges the user’s situation/feelings from the \n",
              "Context/Input in a polite, understanding way.\",\n",
              "    \"Evaluate professionalism and clarity: ensure respectful, blame-free tone and concise, easy-to-understand \n",
              "wording; choose the response that best balances empathy, professionalism, and succinct clarity.\"\n",
              "] \n",
              " \n",
              "Winner: GPT-4\n",
              " \n",
              "Reason: GPT-4 delivers a single, concise, and professional email that directly addresses the context (acknowledges \n",
              "receiving a keyboard instead of the ordered wireless mouse), apologizes, and clearly outlines next steps (send the \n",
              "correct mouse and provide return instructions) with a polite verification step (requesting a photo). This best \n",
              "matches the request to write a response and balances empathy and clarity. In contrast, Gemini includes multiple \n",
              "options with meta commentary, which dilutes focus and fails to provide one clear reply; while empathetic and \n",
              "detailed (e.g., acknowledging frustration and offering prepaid labels), the multi-option format and an \n",
              "over-assertive claim of already locating the order reduce professionalism and succinct clarity compared to GPT-4.\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "======================================================================\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">======================================================================\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'GPT-4'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 79
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The evaluation shows that GPT-4 outperformed Gemini at generating a support email that balances empathy, professionalism, and clarity. GPT-4’s reply won because it was a single, concise, action-oriented email: it apologized for the mistake, acknowledged the specific issue (a keyboard shipped instead of the ordered wireless mouse), and clearly laid out the next steps, sending the correct item and providing return instructions, all in a respectful, blame-free tone. Gemini’s response, while empathetic and detailed, offered three alternative drafts with meta commentary, which diluted focus and fell short of the task of writing one clear reply; its over-assertive claim of having already located the order also cost it professionalism points. The result highlights GPT-4’s ability to deliver focused, customer-centric communication rather than a menu of options."
      ],
      "metadata": {
        "id": "VkkPH4Od2mH3"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Visualizing the responses in a table"
      ],
      "metadata": {
        "id": "fX39V7lQ2qMb"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "pd.set_option('display.max_colwidth', None)\n",
        "\n",
        "# Show both drafts side by side; the \"OpenAI\" row holds the GPT-4 response judged above\n",
        "df = pd.DataFrame({\n",
        "    \"Model\": [\"Gemini\", \"OpenAI\"],\n",
        "    \"Response\": [geminiResponse, openAI_response]\n",
        "})\n",
        "df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 320
        },
        "id": "xb6K1wK_mV7m",
        "outputId": "c63e5253-fc69-4a81-f4a5-e62e868883a8"
      },
      "execution_count": 81,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "    Model  \\\n",
              "0  Gemini   \n",
              "1  OpenAI   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Response  \n",
              "0  Of course. Here are a few options, ranging from standard and professional to more proactive.\\n\\n### Option 1: Standard & Professional (Most Common Choice)\\n\\nThis is a clear, polite, and effective response that solves the problem efficiently.\\n\\n**Subject: Re: Your Recent Order - Incorrect Item Received**\\n\\nDear [Customer Name],\\n\\nThank you for contacting us and for bringing this to our attention.\\n\\nI am very sorry to hear that you received a keyboard instead of the wireless mouse you ordered. I understand this is frustrating, and I sincerely apologize for our mistake.\\n\\nWe want to get the correct item to you as quickly as possible. To help me locate your order, could you please reply with your order number?\\n\\nOnce I have that, I will immediately ship the correct wireless mouse to you. I will also email you a pre-paid return label so you can send the keyboard back to us at no cost.\\n\\nWe appreciate your patience and look forward to resolving this for you.\\n\\nBest regards,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 2: Proactive & Customer-Focused\\n\\nThis option goes a step further by shipping the new item immediately, showing extra trust in the customer and prioritizing speed.\\n\\n**Subject: Getting Your Correct Order Shipped Immediately**\\n\\nDear [Customer Name],\\n\\nThank you for your email. I am so sorry about the mix-up with your recent order. Receiving the wrong item is definitely not the experience we want for our customers, and we apologize for the inconvenience.\\n\\nI have already located your order based on your email address and have arranged for the correct wireless mouse to be shipped to you today via express shipping. You should receive a separate email with the new tracking information shortly.\\n\\nRegarding the keyboard you mistakenly received, there's no rush. I've attached a pre-paid return label to this email. 
Please feel free to use the original packaging to send it back to us at your convenience.\\n\\nAgain, we sincerely apologize for the error. If there is anything else I can help you with, please don't hesitate to ask.\\n\\nSincerely,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 3: Concise & Friendly\\n\\nThis version is slightly less formal and gets straight to the point, which can be effective for some brands.\\n\\n**Subject: Re: Wrong item in your order**\\n\\nHi there,\\n\\nThanks for reaching out, and I'm so sorry we sent you a keyboard instead of a mouse! Let's get that fixed for you right away.\\n\\nCould you please reply with your order number?\\n\\nAs soon as I have it, I'll get the correct wireless mouse shipped out to you and send you a free return label for the keyboard.\\n\\nApologies again for the mix-up!\\n\\nAll the best,\\n\\n[Your Name]\\nCustomer Support  \n",
              "1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                             Subject: Re: Incorrect Item Received – Order Support\\n\\nDear [Customer Name],\\n\\nThank you for reaching out to us, and I apologize for the mix-up with your order.\\n\\nWe’re sorry you received a keyboard instead of the wireless mouse you ordered. To resolve this as quickly as possible, could you please reply to this email with a photo of the item you received? Once we have this, we will arrange for the correct mouse to be sent to you right away and provide instructions for returning the incorrect item.\\n\\nThank you for your patience, and we look forward to resolving this for you.\\n\\nBest regards,  \\n[Your Name]  \\nCustomer Support Team  \\n[Company Name]  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-28cefe74-890f-4484-a4fa-aa4f196def34\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Response</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Gemini</td>\n",
              "      <td>Of course. Here are a few options, ranging from standard and professional to more proactive.\\n\\n### Option 1: Standard &amp; Professional (Most Common Choice)\\n\\nThis is a clear, polite, and effective response that solves the problem efficiently.\\n\\n**Subject: Re: Your Recent Order - Incorrect Item Received**\\n\\nDear [Customer Name],\\n\\nThank you for contacting us and for bringing this to our attention.\\n\\nI am very sorry to hear that you received a keyboard instead of the wireless mouse you ordered. I understand this is frustrating, and I sincerely apologize for our mistake.\\n\\nWe want to get the correct item to you as quickly as possible. To help me locate your order, could you please reply with your order number?\\n\\nOnce I have that, I will immediately ship the correct wireless mouse to you. I will also email you a pre-paid return label so you can send the keyboard back to us at no cost.\\n\\nWe appreciate your patience and look forward to resolving this for you.\\n\\nBest regards,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 2: Proactive &amp; Customer-Focused\\n\\nThis option goes a step further by shipping the new item immediately, showing extra trust in the customer and prioritizing speed.\\n\\n**Subject: Getting Your Correct Order Shipped Immediately**\\n\\nDear [Customer Name],\\n\\nThank you for your email. I am so sorry about the mix-up with your recent order. Receiving the wrong item is definitely not the experience we want for our customers, and we apologize for the inconvenience.\\n\\nI have already located your order based on your email address and have arranged for the correct wireless mouse to be shipped to you today via express shipping. You should receive a separate email with the new tracking information shortly.\\n\\nRegarding the keyboard you mistakenly received, there's no rush. I've attached a pre-paid return label to this email. 
Please feel free to use the original packaging to send it back to us at your convenience.\\n\\nAgain, we sincerely apologize for the error. If there is anything else I can help you with, please don't hesitate to ask.\\n\\nSincerely,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 3: Concise &amp; Friendly\\n\\nThis version is slightly less formal and gets straight to the point, which can be effective for some brands.\\n\\n**Subject: Re: Wrong item in your order**\\n\\nHi there,\\n\\nThanks for reaching out, and I'm so sorry we sent you a keyboard instead of a mouse! Let's get that fixed for you right away.\\n\\nCould you please reply with your order number?\\n\\nAs soon as I have it, I'll get the correct wireless mouse shipped out to you and send you a free return label for the keyboard.\\n\\nApologies again for the mix-up!\\n\\nAll the best,\\n\\n[Your Name]\\nCustomer Support</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>OpenAI</td>\n",
              "      <td>Subject: Re: Incorrect Item Received – Order Support\\n\\nDear [Customer Name],\\n\\nThank you for reaching out to us, and I apologize for the mix-up with your order.\\n\\nWe’re sorry you received a keyboard instead of the wireless mouse you ordered. To resolve this as quickly as possible, could you please reply to this email with a photo of the item you received? Once we have this, we will arrange for the correct mouse to be sent to you right away and provide instructions for returning the incorrect item.\\n\\nThank you for your patience, and we look forward to resolving this for you.\\n\\nBest regards,  \\n[Your Name]  \\nCustomer Support Team  \\n[Company Name]</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-28cefe74-890f-4484-a4fa-aa4f196def34')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-28cefe74-890f-4484-a4fa-aa4f196def34 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-28cefe74-890f-4484-a4fa-aa4f196def34');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    <div id=\"df-8d37b4d0-2fbd-483a-991e-07732975e656\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8d37b4d0-2fbd-483a-991e-07732975e656')\"\n",
              "                title=\"Suggest charts\"\n",
              "                style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "      <script>\n",
              "        async function quickchart(key) {\n",
              "          const quickchartButtonEl =\n",
              "            document.querySelector('#' + key + ' button');\n",
              "          quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "          quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "          try {\n",
              "            const charts = await google.colab.kernel.invokeFunction(\n",
              "                'suggestCharts', [key], {});\n",
              "          } catch (error) {\n",
              "            console.error('Error during call to suggestCharts:', error);\n",
              "          }\n",
              "          quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "          quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "        }\n",
              "        (() => {\n",
              "          let quickchartButtonEl =\n",
              "            document.querySelector('#df-8d37b4d0-2fbd-483a-991e-07732975e656 button');\n",
              "          quickchartButtonEl.style.display =\n",
              "            google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "        })();\n",
              "      </script>\n",
              "    </div>\n",
              "\n",
              "  <div id=\"id_465b9822-d5e5-4025-a3eb-809f5a2bba99\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_465b9822-d5e5-4025-a3eb-809f5a2bba99 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df",
              "summary": "{\n  \"name\": \"df\",\n  \"rows\": 2,\n  \"fields\": [\n    {\n      \"column\": \"Model\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"OpenAI\",\n          \"Gemini\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Response\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Subject: Re: Incorrect Item Received \\u2013 Order Support\\n\\nDear [Customer Name],\\n\\nThank you for reaching out to us, and I apologize for the mix-up with your order.\\n\\nWe\\u2019re sorry you received a keyboard instead of the wireless mouse you ordered. To resolve this as quickly as possible, could you please reply to this email with a photo of the item you received? Once we have this, we will arrange for the correct mouse to be sent to you right away and provide instructions for returning the incorrect item.\\n\\nThank you for your patience, and we look forward to resolving this for you.\\n\\nBest regards,  \\n[Your Name]  \\nCustomer Support Team  \\n[Company Name]\",\n          \"Of course. Here are a few options, ranging from standard and professional to more proactive.\\n\\n### Option 1: Standard & Professional (Most Common Choice)\\n\\nThis is a clear, polite, and effective response that solves the problem efficiently.\\n\\n**Subject: Re: Your Recent Order - Incorrect Item Received**\\n\\nDear [Customer Name],\\n\\nThank you for contacting us and for bringing this to our attention.\\n\\nI am very sorry to hear that you received a keyboard instead of the wireless mouse you ordered. I understand this is frustrating, and I sincerely apologize for our mistake.\\n\\nWe want to get the correct item to you as quickly as possible. 
To help me locate your order, could you please reply with your order number?\\n\\nOnce I have that, I will immediately ship the correct wireless mouse to you. I will also email you a pre-paid return label so you can send the keyboard back to us at no cost.\\n\\nWe appreciate your patience and look forward to resolving this for you.\\n\\nBest regards,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 2: Proactive & Customer-Focused\\n\\nThis option goes a step further by shipping the new item immediately, showing extra trust in the customer and prioritizing speed.\\n\\n**Subject: Getting Your Correct Order Shipped Immediately**\\n\\nDear [Customer Name],\\n\\nThank you for your email. I am so sorry about the mix-up with your recent order. Receiving the wrong item is definitely not the experience we want for our customers, and we apologize for the inconvenience.\\n\\nI have already located your order based on your email address and have arranged for the correct wireless mouse to be shipped to you today via express shipping. You should receive a separate email with the new tracking information shortly.\\n\\nRegarding the keyboard you mistakenly received, there's no rush. I've attached a pre-paid return label to this email. Please feel free to use the original packaging to send it back to us at your convenience.\\n\\nAgain, we sincerely apologize for the error. If there is anything else I can help you with, please don't hesitate to ask.\\n\\nSincerely,\\n\\n[Your Name]\\nCustomer Support Team\\n\\n---\\n\\n### Option 3: Concise & Friendly\\n\\nThis version is slightly less formal and gets straight to the point, which can be effective for some brands.\\n\\n**Subject: Re: Wrong item in your order**\\n\\nHi there,\\n\\nThanks for reaching out, and I'm so sorry we sent you a keyboard instead of a mouse! 
Let's get that fixed for you right away.\\n\\nCould you please reply with your order number?\\n\\nAs soon as I have it, I'll get the correct wireless mouse shipped out to you and send you a free return label for the keyboard.\\n\\nApologies again for the mix-up!\\n\\nAll the best,\\n\\n[Your Name]\\nCustomer Support\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 81
        }
      ]
    }
  ]
}